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Abstract. 


The  paper  addresses  the  performance  of  square  elements  of  type 
Q(p)  and  Q'(p).  (The  Q(p)  resp.  Q'(p)  are  elements  of  degree  p  analo¬ 
gous  to  the  well  known  9  and  8  noded  elements  for  p  «  2. )  The  performance  is 
analyzed  theoretically  for  the  class  of  analytic  functions.  Numerical  experi¬ 
ments  confirm  the  conclusions  drawn  from  the  theory.  The  computational  com¬ 
plexity  of  a  solution  algorithm  is  studied  using  timings  of  the  computation  on 
an  Alliant  FX/8  computer.  The  data  show  that  high  order  elements  are 
preferable. 

1 .  Introduction. 

The  difference  between  the  performance  of  the  8  noded  and  9  noded  quadri¬ 
lateral  element  has  been  directly  and  indirectly  addressed  in  various  contexts 
in  the  literature.  The  8  noded  element  is  sometimes  called  the  serendipity 
element.  In  the  mathematical  literature  [1]  the  9  noded  element  is  denoted  as 
the  Q(2)  element  while  the  8  noded  element  is  denoted  as  the  Q'(2)  element. 
Analogously,  we  can  define  elements  Q(p)  and  Q'(p)  for  the  general  degree 
p  £  2.  As  we  will  see  in  Section  2.  the  Q'(p)  element  is  the  element  with 
minimal  number  of  internal  shape  functions,  while  in  Q(p)  additional  inter¬ 
nal  shapes  are  present.  Of  course,  in  general  we  can  have  less  or  more  inter¬ 
nal  shape  functions,  in  principle  up  to  an  infinite  number  of  them.  A  natural 
question  arises  about  optimal  selection  of  the  number  of  internal  shape  func¬ 
tions.  This  question  is  especially  important  in  the  context  of  the  h-p  ver¬ 
sion  of  the  finite  element  method  and  its  adaptive  features. 

The  performance  of  the  Q(p)  and  Q'(p)  elements  (and  others,  which 
differ  by  the  nximber  of  the  internal  shape  functions)  can  be  compared  from 
various  points  of  view.  In  [2],  [3]  we  addressed  in  detail  the  differences 
among  various  aspects  of  implementation  on  parallel  computers.  In  [2],  [4]  we 
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addressed  the  question  of  the  performance  of  Iterative  procedures.  Here,  in 
this  paper  we  will  be  interested  primarily  in  the  question  of  the  approxima¬ 
tion  properties  (in  the  H^-seminorm)  of  the  Q(p)  auid  Q'(p)  elements  and 
their  computational  effectivity  for  achieving  prescribed  accuracy. 

The  question  of  which  element  is  preferable  cannot  be  answered  in  general 
(see  Section  8).  This  question  can  be  addressed  only  in  relation  to  a  class 
of  functions  to  be  approximated.  This  class  has  to  be  adequate  to  the  prob¬ 
lems  solved  in  practice.  In  this  paper  we  consider  the  class  of  analytic 
functions.  This  class  is  very  natural  for  static  structural  mechanics  prob¬ 
lems  where  the  solutions  satisfy  an  elliptic  differential  equation  with  piece- 
wise  analytic  right  hand  side  and  boundary  conditions  on  a  domain  12  with 
piecewise  analytic  boundary.  The  performance  of  the  Q(p)  and  Q'(p)  ele¬ 
ments  will  be  related  to  distance  of  the  element  to  the  boundary  of  the  ana¬ 
lytic!  ty  domain  of  the  approximated  function. 

Obviously  the  Q'(p)  element  has  a  smaller  number  of  the  shape  func¬ 
tions.  Hence,  we  can  ask  what  is  the  smallest  number  K  >  1  such  that  the 
Q'(Kp)  element  yields  a  better  approximation  than  the  Q(p)  element  for  a 
particular  approximated  function  (or  class  of  functions).  The  number  K  can 
be  used  as  a  natural  comparison  index.  We  can  use  this  index  together  with 
some  other  complexity  indicator  to  assess  the  computational  performance.  For 
example  in  [2] ,  [3]  we  have  used  the  value  H  *  Vz  when  we  analyzed  the  com¬ 
plexity  of  parallel  computations. 

In  this  paper  we  will  present  a  theoretical  upper  estimate  for  the  index 
K  when  the  approximation  is  measured  in  the  H^-seminorm.  Further,  we  will 
present  the  results  of  various  numerical  experiments  on  a  model  example.  In 
additions,  some  comparisons  between  quadrilateral  auid  triangular  elements  will 
be  given,  as  well  the  relation  between  the  mesh  sizes  and  the  degree  of  ele- 
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meats  leading  to  the  minimal  computational  cost  for  a  given  requested 
accuracy. 


2.  The  Q(p)  and  Q'(p)  elements. 


Denote  by  S  =  {|x|  <  1/2,  lyl  <  1/2}  the  unit  square  and  by  T, , 

4 

1,2, 3, 4  its  sides,  as  indicated  in  Figure  2.1;  let  F  =  U  F ,=  an 

-  1=1  ^ 


i 


Figure  2.1.  Scheme  of  the  square  domain. 

In  contrast  to  [1],  we  define  the  spaces  Q(p)  and  Q'(p)  via  listing 
the  nodal,  side  and  internal  shape  functions  as  introduced  in  [5]. 
a)  The  Q(p)  elements.  Here  we  define 

a)  the  nodal  shape  functions  associated  with  the  nodes  N^: 


Ni  : 

dj(x,y)  « 

(1/2  +  x)(l/2  +  y). 

Nz  : 

^2(x,y)  « 

{l/2-x}(l/2  +  y). 

M3  : 

^^(x,y)  » 

(l/2  +  x)(l/2-y). 

N4  : 

^4(x,y)  » 

(1/2  -  x)(l/2  -  y). 
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b)  the  side  shape  functions  associated  with  the  sides  r^: 


'^1  ’  Pj(y)(l/2  +  x). 

J  =  2,3,.. 

.  >P> 

j(x.y)  *  P^(x)(l/2  +  y). 

J  =  2,3,.. 

.  ,P. 

^3  = 

*  Pj(y)(l/2  -  x). 

J  =  2,3,.. 

.  .P. 

^4  = 

^4  *  Pj(x)(l/2  -  y). 

J  “  2,3,.. 

•  »  P  • 

Here  P.Cx)  Is  a  polynomial  of  degree  J  such  that  P  (±1/2)  =  0. 

^  ^  ^ 
Specifically  we  use,  Lj_j(t)dt,  J  =  2,3,...,p 

J-l 


(t)dt,  J  =  2,3,...,p  where 


Lj(t)  is  the  Legendre  polynomial  of  degree  J.  For  more  see  e.g.,  [5]. 
c)  The  internal  shape  functions  for  the  Q(p)  element: 


(2.3)  Pj  j(x,y)  *  Pj(x)  Pj(y)  l,J*2,...,p. 

The  space  of  the  Q(p)  elements  is  the  span  of  its  nodal,  side  and  internal 
shape  functions. 

^)  The  Q'(p)  elements.  Here  the  tuxial  and  side  shape  functions  are 
the  same  as  in  the  family  Q(p)  l.e.,  they  are  given  by  (2.1)  auid  (2.2). 
c')  The  internal  shape  functions  for  the  Q'(p)  element: 


Pl,j(x.y)  »  Pj(x)  Pj(y).  l.J,  *  2,  i+J  s  p. 

The  space  of  the  Q'(p)  elements  is  the  span  of  its  nodal,  side  and  shape 
functions. 

The  Q(p)  and  Q'(p)  elements  can  be  described  by  listing  all  of  the 
monomials  belonging  to  their  space.  We  can  visualize  them  in  the  Pascal  table 
of  Figure  2.2. 
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Figure  2.2.  The  Pascal  table  for  the  Q(p)  element. 

Every  bullet  In  the  Pascal  table  depicts  the  term  x'^y^  '^.  The  Q(p)  element 
Is  then  associated  with  the  set  of  the  bullets  In  the  dlaunond  area  shown  In 
Figure  2.2.  The  value  of  p  Is  shown  there  too.  Analogously  we  depict  the 
Pascal  table  for  the  Q'(p)  element  (see  Figure  2.3). 


.01234 

y  ^  y  y 


FlgiU'e  2.3.  The  Pascal  table  for  the  Q'(p)  element. 

The  Q'(p)  space  Is  the  smallest  space  of  polynomials  Including  the  p>olyno- 
mlals  of  total  degree  p  and  the  side  functions  which  are  polynomials  of 
degree  p  on  one  side  and  zero  on  the  three  others. 
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It  is  easy  to  compute  the  dimension  of  the  spaces  Q(p)  and  Q'(p).  We 


have 


Dimension  of  Q(p)  =  (p+1)' 
Dimension  of  Q'(p)  »  4 

-  8 


4P  ♦  <£±)  <p:31 


for  p  =  1 
for  p  “  2 
for  p  a  3 


In  the  next  section  we  will  also  employ  the  space  Q(p,q)  -  Q'(p>q)> 


with 


Q(p.q)  »  Q''(p)  «  Z(q) 

Z(q)  «  span{Pj(x)  Pjty)*  0  s  1,J  s  q} 


and 


Q'(p.q)  -  Q*(P)  ®  Z'(q) 

Z'(q)  -  span{P^(x)  Pj(y).  0  s  1+J  s  q> 

where  Q^Cp)  is  the  span  of  the  nodal  and  side  functions  only.  We  have 
Q(p,p)  ■  Q(p)  and  Q'(p,p)  »  Q'(p).  The  spaces  Z(q)  auid  Z'(q)  consist 
only  of  internal  shape  functions. 

We  also  introduce 

Q(p.«)  -  Q*(p)  e  hJ(S)  -  Q'(p,€.) 

where  H^CS)  is  the  standard  Sobolev  space  with  zero  traces  on  r. 


3.  The  model  problem. 
Let  X  +  iy  *  z  e 

(3.1)  w^®^z)  « 

o 

(a) 

The  function  w^  (z) 


C,  Zq  ■  jd+l),  a  >  1  and 


a^-(z+zo)*  a^+(z+Zo)^ 


J_^  J_ 

a^-1  a^+1 


is  •  holomorphic  function  of  the  complex  variable 


z 
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on  S  »  {|x|  <  1/2,  lyl  <  l/2>.  Denote 


(3.2) 


(a)  .  (a) 

Vq  -IdiWq  ; 


C31 )  C  31 ) 

then  Uq  and  are  harmonic  on  S. 


Let  H^(S)  be  the  real  (or  complex)  Sobolev  space.  For  u,v  e  H^(S) 


let 

(3.3) 

(3.4) 


[u.v]  *  r  (Vu  •  Vv)  dxdy 

|u|^  =  J  |7u|^  dxdy 


be  the  scalar  product  and  seminorm  In  H  (S).  From  the  Cauchy-Riemann  condi¬ 
tion  we  have 


and 

(3.5) 


Let  us  consider  the  Neumann  problem  for  the  Laplace  equation  on  5 


(3.6) 


-Au  “  0 


on  S 


(3.7) 


du 


on  r 


.(a) 


where  g  is  such  that  the  solution  of  (3.6)  and  (3.7)  is  .  Because  the 

solution  u  of  (3.6)  and  (3.7)  is  determined  up  to  a  constant,  we  will  assume 

that  u(-l/2,  1/2)  ■  Uq®^(-1/2,  1/2).  Note  that  we  are  Interested  in  the 

seminorm  1*1,  and  hence  this  constant  is  not  essential. 

For  any  a  >  1,  D  ul  »  - -2 —  c  H'‘^(S)  for  any  multindex 

°  ax*»ay“2 

a  ■  (a^,a2)  a  0,  ®2  *  integers,  |a|  *  ♦  a2.  Nevertheless  as 

ft  ( 3l )  (si ) 

a— »1,  ID  u  I— and  hence  u^  becomes  less  smooth  as  a  decreases. 

(  ft  )  ™ 

The  function  u^  is  analytic  on  S  and  its  analyticity  domain  is  D  » 
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R  \{z  =  ±a  -  Zq,  ±la-ZQ>.  This  function  is  a  typical  representative  of  the 
solutions  of  elliptic  PDE  problems  in  two  dimensions.  In  practice  such  solu¬ 
tions  are  analytic  with  singularities  only  at  a  finite  number  of  points  on  the 
boundary  of  the  domain,  e.g.,  where  the  boundary  has  corner  or  the  boundary 
condition  type  is  changing. 

We  will  be  interested  in  the  performance  of  the  finite  element  method  for 

(  81 ) 

solving  (3.6)  and  (3.7)  where  the  exact  solution  is  u^  and  the  mesh  con- 

1  2 

sists  of  squares  with  sides  of  length  j  (i.e.,  S  is  divided  into  I 
squares ) . 

Let 

0'(p)  *  (u  €  Q'(p)|  u  is  harmonic  polynomial). 

Obviously  we  have 

(3.9)  Q(p.»)  b  Q(p) 


(3.10)  Q'(p)  b  Q'(p). 

We  denote  by  u^®^  (Q(p)  ) ,  u^®^  (Q'(p)  ,£) ,  Uq®\q(P,ob)  ,<) ,  u^®^  (Q'(p)  ,f) 

the  finite  element  solution  of  (3.6)  (3.7)  using  the  respective  Q(p)  and 

Q'(p)  element  etc.  on  the  mesh  consisting  by  squares.  The  finite  element 

(H  ) 

solution  is  the  projection  of  u^  on  the  ret  of  finite  element  functions. 

We  will  impose  the  constraint  at  (-1/2,  1/2)  that  the  finite  element  solution 

(  81 ) 

coincides  with  the  exact  solution  u^  to  get  uniqueness.  This  constraint 
does  not  influence  the  seminorm  1*1. 

Denote 


ti^®\q(p),/) 


and  analogously  (Q'(p)  ,f)  (Q(p,»)  ,^) ,  (Q(p)  ,£) .  For  any  a> 
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1  we  have 


and 

t^^®^Q'(2p).^)  s  ii^*^Q(p)£) 

Let 

(3.11)  -  inf  {<r.  <r  >  1  |  Ti^*^Q'(<rp) ,f)  s  u^®^Q(p).^)} 

The  Index  K^^^(p,£)  is  a  good  characterization  of  the  performance  of  the 
Q(p)  and  Q'(p)  elements  relative  to  the  function  (and  mesh  composed 

by  ^  elements) 

Let  us  further  define 

(3.12)  J^^Up.t)  -  inf  {<r,  <r>l  |  T»^^^Q'((rp).£)  s  D^*^Q(p, «),£)} 

r&)  ca)  (ai 

Obviously  7  (p.£)  a:  H  (p.^)  and  hence  7  (p,^)  is  an  upper  bound  for 

K^*^p.e). 

If  £  >  1  we  often  will  not  write  the  index  t.  For  example,  we  will 
write  i»^*^(Q(p))  instead  Ti^*^(Q(p),l),  7^®\p)  instead  7^®^(p,l),  etc. 


4.  Asymptotic  estimate  of  7^*^(p),  f  »  1. 

Let 

(4.1)  p[*^p)  -  inf  . 4 

^  fc»eP,  (p)  ® 

Here  P^(p)  is  the  set  of  all  polynomials  on  F^,  1  >  1,2, 3, 4,  of  degree 

p.  By  I  •  I  ■  inf  1^1,  A,  (p)  »  {#  I  is  harmonic  on  S  and 

H*'^(r,)  ^At(f)  ^ 

1/2 

if>  •  p  on  r^}  we  denote  the  standard  H  (F^)  seminorm  of  the  traces  on 
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Then  by  the  extension  theorem  we  have 

(4.2)  (Q(P,od)  )  fc  max  (p)]“  (p) . 

Isls4t  ‘  ) 

Further  we  have 

(4.3)  lul‘‘’|T|‘*’(a'(p))  a  Inf 

°  wsS'Cp)  ° 

where  Q*^(p)  Is  the  space  of  all  complex  polynomials  of  degree  s  p. 

Now  we  apply  the  classical  theory  of  approximation  of  functions  In 
complex  domain  [6].  First  we  will  consider  the  approximation  on  F^.  To  this 
end  let 

I  ■  {2  ■  X  +  ly,  lx|  s  1/2,  y  =  0}  c  C 

and 

n^=  {z  e  C,  2  «  I> 

The  function 

(4.4)  0(0  *  5  C  -  ?  +  iTj 

maps  the  7  =  (C  I  Kl  >  Di  l.e.,  the  outside  of  the  unit  circle  In  the 
C-plan®)  onto  Q^. 

Let  f(z)  be  a  holomorphlc  function  In  a  domain  Z)  c  C,  I  c  Z7  auid 
^(z)  Is  real  for  z  e  I.  Define  ♦(<)  »  fp(0(<)).  Then  #(<)  Is  a 
holomorphlc  function  In  an  annulus 

«  I  1  <  Kl  <  R> 

for  certain  R  which  depends  on  the  domain  of  analytlclty  Z)  of  the  function 
(p.  We  assume  that  largest  possible  annulus,  l.e.,  0(<)  Is  not 

holomorphlc  In  the  annulus  ®*iy  c  >  0. 

From  [6]  §5.2,  Theorem  3,  we  have  for  any  c  >  0  and  k  «  0,1 
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1/2 

(4.5)  C,(e)(R+e)"P  s  inf  ff  s  C_(e)  (R-c)"^ 

^  w€?«(p)  i  ^ 

where  Is  the  set  of  all  complex  polynomials  on  I  of  degree  s  p  amd 

C^(c),  ^2(0)  are  constants  Independent  of  p. 

Using  the  standard  interpolation  theory  in  Banach  spaces  [7]  we  get  from 

(4.5) 

(4.6)  C  (c)(R+c)“P  s  inf  Ip-w|  s  r  (c)  (R-e)"^ 

«€?>«(p)  h'^*(I)  ^ 

Here  )|^»(x,y)|  «  l<(i(x,  y+l/2)|  .  We  note  that  if  p(x)  is  real 

r  ^(I)  H*^(r2) 

on  I  then  we  can  replace  in  (4.5)  the  space  ?’*^(p)  of  complex  polynomials 
by  the  space  ^(p)  of  real  polynomials. 

For  1  »  1  let 

Wj(2)  «  Wj(x+iy)  -  i  (w(x+iy)  +  w(l-x+iy)  j , 

then  Wj(7)  is  a  holomorphic  function  In  a  domain  2)^  c  C,  Fj  c  2)  and 
Wj(z)  »  Uq(z)  for  z  €  Fj.  Analogously  we  define  Wj(z)  ,  i  »  2,3,4.  We 
use  now  (4.6)  for  the  functions  w^(z)  instead  of  p(z). 

Relation  (4.6)  now  leads  in  our  particular  case  immediately  to  the 
estimate 

(4.7)  Cj(c)(R^'^+c)‘P  a  M^*^p)  a  02(0) (R^^ ^-c)‘P 
where 

(4.8)  R^^^  -  min  (lCj|,lC2ll 
with 

^(Cj)  »  (a  -  1)  +  1/2 

^(<2)  “  I  ♦ 

Then  we  get  from  (4.2)  and  (4.7) 

(4.9)  M^*^p)  m  C(c)(R^^^+c)"P 
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Because  for  large  <  we  have 


(4.10) 


we  get  for  large  a 


(4.11)  tj^®^Q(p.«))  a  Cj(c)(4a  +  G)"P 

f  a  )  A 

We  will  now  consider  the  estimate  for  t)  (Q'(p)).  By  the  saune  argument 
as  above  let  i^(<)  denote  the  conformal  mapping  of  the  outside  of  the  unit 
circle  r  onto  the  outside  of  S.  Using  the  Schwarz  (Thrlstoffel  formula  (see 
[8],  (91)  we  get  with  Zq  »  Zq.  |Zq|  =  1 


(4.12) 


with  c  a  0.591  obtained  by  numerical  integration.  The  function  |^(<)  now 
plays  the  same  role  as  the  function  ^(^)  in  (4.4). 

Let 

(4.13)  »  min 

isis4 


where 

«  a  +  Zq 
»  -a  +  Zq 
^«^^^)  -  ia  +  Zq 
«  -ia  +  Zq 

Then  we  get  analogously  as  before 

(4.14)  i|^®^Q^(p))  s  C2(c)(R^^^-c)"P. 


Because  for  large  <  we  have 


^(C)  «  0.591  Z 
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we  get  for  large  a 


(4.15)  a  C2(c)(0.59l"^a  -  c)"^  *  C2{e)  (1.692a  -  e)"** 

Remark  4.1.  From  the  Llndelof  principle,  we  always  have  >  R^^^. 

From  (4.11).  (4.15).  and  (3.12)  we  see  that  for  large  a. 

(4.16)  y^®^(p)  a  p 
where 

(4.17)  (l,642a)f;  ,  ^ 

(4a)P 

( s ) 

emd  hence  for  large  a  and  p  we  get  ^  «  1  and  hence  also  H  »  1. 

On  the  other  hand  it  is  easy  to  see  that  for  small  a.  R^^^  -  1  » 

1/2  f21  2/*! 

0((a-l)  )  and  R^  -1  »  0((a-l)^'*).  Hence  for  small  a  P— >  e#  and  we  can 

expect  that  with  decreasing  a.  K  will  grow.  Of  course  always  If  a  2. 


5.  Numerical  examples.  Performance  of  the  Q(p)  and  Q*(p)  elements  In  the 
p-verslon  -  1*1. 

In  Figure  5.1  we  show  the  graphs  of  ij^*^(Q(p))  and  T}^®^(Q'(p))  as 

Csi) 

functions  of  a  and  p  in  the  scale  Ig  i)  x  p.  This  scale  is  associated 
with  the  expected  behavior  «  (R(a))  ^  in  which  case  we  would  get  a 

straight  line.  In  Table  5.1  we  give  the  values  of  R^^\  i  >  1.2,  with 
t)^®^Q(P,«))  (R^'^)"P  from  (4.8)  and  with  u^®^(0'(p))  *  (R^^^)~^  from 

(4.13). 
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Relative  error  in  % 


Degree  p 


Figure  5.1.  Performance  of  the  Q(p)  amd  Q'(p)  elements  for  various  a. 


Table  5.1.  The  rates  i  »  1,2 


a 

1.05 

1.39 

1.13 

1.10 

1.60 

1.25 

1.50 

2.81 

1.86 

2.00 

3.97 

2.38 

14 


C  & ) 

In  Figure  5.1  we  also  show  theoretical  slopes  for  i)  (Q(p.cs))  and 

(Q'(p) )  which  are  expected  approximations  of  the  rates  of  i)^^^(Q(p)) 

(  &  ) 

and  1)  (Q'(p))  respectively.  We  see  that  in  fact  the  theoretical  slope  of 

(Q(p,«) )  is  close  to  the  one  of  ii^*^(Q(p))  (but  is  slightly  larger)  and 
the  slope  of  (Q'(p) )  is  slightly  larger  than  that  of  (Q'(p) )  as 

should  be  expected. 


Figure  5.2a.  Comparison  of  the  performance  of  the  Q(p)  and  Q'(p) 

elements  for  a  *  1.05. 

r® 1  (a) 

Figure  5.23  shows  the  error  t)  (Q(p))  and  i}  (Q'(p))  on  a  Ig  i)  x 
Ig  p  scale  for  a  ■  1.05.  We  see  that  the  curves  are  nearly  parallel  and 

i»^®^Q'(p«r(p)))  -  i>^®\q(p)) 

where  for  small  p,  <r(p)  «  1.8  and  for  larger  p,  <r(p)  «  1.4.  Figure  5.2b 
shows  the  analogous  graphs  for  a  «  2.0.  Here  we  see  1.3  <  trCp)  <  1.6.  This 
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indicates,  as  the  analyses  support,  that  <r  decreases  as  a  increases, 
i.e.,  as  the  smoothness  of  the  solution  increases. 


2  3  436789  1113 

Degree  p 


Figure  5.2b.  Comparison  of  the  performance  of  the  Q(p)  and  Q'(p) 

elements  for  a  ■  2. 


We  have  seen  that  the  difference  among  the  spaces  Q(p).  Q'(p),  Q(Pi(i)> 
Q'(p,q)  differ  only  in  the  different  number  of  internal  shape  functions.  In 
Figure  5.3a,b  we  show  the  error  (Q(p,q) )  and  (Q'(p,q) )  for  q 


Figure  5.3a. 


Influence  of  the  internal  shape  functions  on  the  accuracy 


for  a  ■  1.1  and  p  «  5,  for  the  Q(p)  and  Q'(p)  elements. 


ilM 


increasing,  p  •  5,7  and  a  •  1.1.  The  value  p  *  q  is  marked  in  the 
figure.  This  value  Is  almost  optimal  for  the  Q  element  but  Is  far  from  the 
optimal  for  the  Q'  element.  As  a— the  Influence  of  the  internal  shape 
functions  decreases. 


Figure  5.3b.  Influence  of  the  Internal  shape  functions  on  the  accuracy 
for  a  a  1.1  and  p  >  7  for  the  Q(p)  and  Q'(p)  elements. 

6.  Performance  of  the  Q(p)  and  Q*(p)  elements  In  the  h-p  version  -  f  >  1. 

In  Section  5  we  analyzed  the  case  when  £  >  1,  i.e.,  the  domain  S  on 
which  the  problem  (3.6)-(3.7)  has  been  solved  was  not  partitioned.  Let  us 
assume  now  that  I  >  1,  I.e. ,  S  is  divided  into  ^  squares.  Ue  can  now 
expect  that  the  error  can  be  essentially  estimated  element  by  element.  It 
has  been  seen  that  the  approximation  was  governed  in  the  case  £  >  1  by  the 
parameter  a  with  a-1  being  the  ratio  of  the  distance  of  the  singularity  to 
the  boxmdary  and  size  of  the  element.  (Note  that  S  is  unit  square. )  Denote 

Ca)  m2 

by  Tj  (£)  the  error  when  elements  are  used.  Then  for  I  not  too  large. 
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the  error  is  governed  by  the  "worst"  element,  i.e.,  the  one  closest  to  the 
singularity.  Then  we  can  approximately  expect 

(6.1) 

(sl)  *2 

where  by  t)  (£)  we  denote  the  relative  error  when  f  elements  are  used. 

In  Figure  S.la.b  we  show  the  error  (Q{p)  ,i)  and  (Q'(p)  ,<)  for 

C  3L  3 

a  ■  1.05.  In  addition,  we  show  in  the  figure  the  error  tj  (Q(p),f)  for 
a  »  (a-l)i  +1.  We  see  that  (6.1)  nearly  holds.  For  other  values  of  a  the 
results  are  similar. 


Degree  p  ! 

I 

Figure  6.1a.  Comparison  of  the  performauice  of  the  Q(p)  element  for 
different  a  and  t  leading  approximately  to  the  same  accuracy. 
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Figure  6.1b.  Comparison  of  the  performance  of  the  Q'(p)  element 
for  different  a  and  I  leading  to  approximately  the  same  accuracy. 

(  31 ) 

In  Figure  6.2  we  show  7)  (Q(p,q),f)  for  a  »  1.1,  p  »  5  and  various 

q  and  t.  The  values  (Q(p) ,£)  ■  ■n^*^(Q(p,p),£)  are  marked.  We  see  that 

for  f  >  1  the  character  does  not  change. 
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Degree  q 


Figure  6.2.  Performance  of  the  Q(p,c|)  element  for  a  =  1.1  and  p  =  5. 


7.  Comparison  of  the  performance  of  the  trlauigular  elements  and  Q(p) 
elements. 

So  far  we  have  discussed  square  elements.  Analogously,  the  triangular 
elements  of  degree  p  can  be  constructed  (see  [5]).  Here  the  space  T(p)  of 
the  triangular  element  consists  of  all  polynomials  of  degree  i  p. 

Let  us  consider  once  more  a  square  element.  Dividing  it  into  two 
triangles  by  the  diagonal  and  using  T(p)  elements  we  can  understand  this  as 
a  composite  element  which  has  the  same  degrees  of  freedom  as  the  element 
Q(p). 
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We  can  also  enrich  the  space  T(p)  by  internal  functions  as  in  the  case 
of  square  elements.  The  analysis  of  the  kind  we  have  made  in  Section  4  can 
here  be  used,  too. 

Based  on  this  analysis  we  can  expect  that  the  performance  of  the  Q(p) 
element  is  better  than  of  T(p)  and  the  performance  of  Q(p)  improves  with 
increasing  p  amd  the  smoothness  of  the  approximated  function.  These 
conclusions  are  in  agreement  with  numerical  tests.  In  Figure  7.1  we  show 
(T(p) ,i)  and  t)^®\q(p),£)  for  a  *=  1.05  and  various  p  and  t. 


Figure  7.1.  Comparison  of  the  performance 
of  the  square  and  triangular  element. 
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8.  Additional  considerations. 


It  Is  obvious  that  In  general  It  Is  Impossible  to  prefer  either  element 
Q(p)  or  Q'(p).  The  preference  depends  strongly  on  the  class  of  solutions 
under  consideration.  To  Illustrate  this  let  us  be  Interested  for  simplicity 
In  the  best  approximations  of  u  on  S  In  the  L2-norm  Instead  of  the  H^(S) 
seminorm,  l.e.,  let  us  consider 

Inf  J  (Uq-w)^  dxdy 
S 


where  the  Inflmum  Is  taken  over  all  functions  of  Q(p)  resp.  Q'(p).  In  this 
case  we  will  consider  shape  functions  of  the  form  L^(x)'Lj(y)  where 
Is  the  normalized  Legendre  polynomial  of  degree  J  Then  we  can  write 


(8.1) 


u. 


I 

i.j«i 


and 

(8.2) 


Inf 

W€Q(p) 


J  dxdy  .  Y,  <=?j 

s  l.J>p 


(8.3) 


Inf 

W€Q'(p) 


J  (Ug-w)2  dxdy  -  ^  c^j 

S  1+J>p 


In  Figure  8.1  we  depict  the  pairs  (i,J)  In  the  upper  quarter  plane  by 
bullets;  then  the  error  of  Q(p)  resp.  Q'(p)  Is  the  sum  of  the  squares  of 
all  coefficients  c^j  associated  with  the  bullets  outside  the  square  resp. 
triangle  shown  In  Figure  8.1.  Hence  the  performance  of  the  element  depends  on 
the  distribution  of  c^j  for  the  approximated  functions  resp.  class  of 
functions  under  consideration.  If  for  small  (l.J)  the  coefficients  are 
decaying  slowly  and  are  approximately  of  the  same  magnitude  then  the 
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Figure  8.1.  Schematic  comparison  of  the  Q(p)  aind  Q'(p)  element. 

coefficient  H  introduced  in  (3.11)  will  have  approximately  the  value  Vz. 

We  have  roughly  seen  this  in  our  numerical  experiments  with  a  «  1.05. 

Ue  remark  that  it  is  easy  to  introduce  a  class  of  functions  for  which 
either  Q(p)  or  Q'(p)  elements  are  preferable  for  this  class. 

Although  we  addressed  for  simplicity  the  L^^norm,  the  same  conclusions 
are  the  same  for  the  H^-seminorm. 

9.  Optimal  relation  between  p  and  t. 

The  optimal  relation  between  p  and  t  can  be  based  on  various 
criteria.  In  Figure  9.1  we  show  the  relation  between  the  number  of  degrees  of 

(a  1 

freedom  for  which  v,  (Q(p),£)  ■  c,  a  ■  1.05,  for  various  p  and  e.  With 
this  criterion,  we  see  that  use  of  higher  p  is  preferable. 

In  general,  of  course,  cost  is  not  proportional  to  the  number  of  degrees 
N.  We  can  also  determine  optimality  in  terms  of  the  computational  complexity 
(cost)  needed  to  solve  the  problem  (3.6)  (3.7)  with  an  accuracy  c.  To 
address  this  question,  we  examine  the  CPU  times  required  to  achieve  a 
specified  accuracy  for  the  implementation  of  a  finite  element  solver  described 
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Figure  9.1.  Number  of  degrees  of  freedom  leading  to  the  same 
accuracy  for  the  element  Q(p,^). 

in  [3].  The  algorithm  used  treats  the  unknowns  associated  with  internal  shape 
functions  in  a  manner  akin  to  domain  decomposition,  in  which  these  unknowns 
are  decoupled  from  the  system  using  static  condensation.  The  resulting  global 
interface  problem  is  then  solved  iteratively  using  a  preconditioned  conjxigate 
gradient  algorithm,  with  the  preconditioner  derived  from  the  portion  of  the 
global  stiffness  matrix  associated  with  the  nodal  points,  i.e.,  p  *  1.  See 
[4]  for  an  analysis  of  this  preconditioner.  The  implementation  was  made  on  an 
eight  processor  Alliant  FX/8  parallel  computer,  with  options  to  run  on  fewer 
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than  eight  processors  also  available.  Further  details  about  the  algorithm  and 
Its  Implementation,  as  well  as  a  detailed  study  of  performance  character¬ 
istics,  can  be  foimd  In  [3]. 

Our  concern  Is  the  cost.  In  GHl  time,  required  to  achieve  a  specified 

accuracy.  The  computations  reported  here  correspond  to  choosing  values  of  t 

and  p  that  produce  a  given  accuracy  and  then  solving  the  model  problem 

(3.6)-(3.7)  with  zero  right  hand  side.  In  all  cases,  the  conjugate  gradient 

Iteration  started  with  a  random  Initial  guess  with  entries  between  0  and  1. 

and  the  stopping  criterion  was  that  the  relative  error  in  the  energy  norm  be 

-.3 

less  than  0.5  x  10  (This  is  smaller  thaui  accuracy  considered.)  All 
computations  were  performed  using  double  precision  FORTRAN. 

Table  9.1  shows  the  time  in  seconds  on  one  processor  for  various  I 
that  produces  solution  with  accuracy  of  order  S%  in  the  finite  element 
solution  for  a  «  1.05.  Both  Q(p)  and  Q'(p)  elements  are  considered. 

Table  9.1.  G*U  times  for  various  combinations  of  t  and  p  to  achieve 
approximately  5%  accuracy  for  a  =  1.05. 


Q(p) 

Q'(p) 

B 

D 

Error  % 

time 

B 

B 

Error  % 

time 

13 

5.8 

1.16 

2 

14 

7.0 

2.35 

9 

5.1 

1.41 

4 

10 

4.5 

3.26 

3.0 

1.96 

8 

B 

3.5 

5.00 

4 

6 

5.8 

1.75 

16 

B 

5.0 

6.58 

4 

B 

2.4 

2.58 

8 

B 

5.8 

2.58 

8 

5 

1.6 

4.14 

B 

16 

3 

3.2 

5,73 

B 

Table  9.2  shows  the  analogous  timings  required  for  accuracy  of  order  0.1%. 
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Table  9.2  CPU  times  for  various  combinations  of  t  and  p  to  achieve 
approximately  0.1%  accuracy  for  a  »  1.05. 


Q(p) 

Q'(p) 

n 

B 

Error  % 

time 

B 

B 

Error  % 

time 

4 

13 

17.87 

4 

14 

0.09 

8.96 

8 

8 

14.68 

8 

9 

0.10 

9.33 

8 

9 

0.04 

21.07 

16 

6 

0.09 

13.92 

16 

6 

0.06 

26.10 

We  see  that  for  a  «  l.OS  for  both  5%  accuracy  and  0.1%  accuracy  the  lowest 
cost  occurs  for  smallest  I  and  large  p.  This  is  still  more  pronounced  for 
a  larger.  For  5%  accuracy  the  Q(p)  element  is  less  expensive  than  the 
Q'(p)  element.  In  contrast  the  Q'(p)  element  was  less  expensive  when  0.1% 
accuracy  is  required.  This  is  consistent  with  the  results  of  the  analysis 
made  in  Section  4. 

As  discussed  in  [3]  many  factors  contribute  to  the  cost  of  the  finite 
element  solver,  but  cost  tends  to  be  dominated  by  the  construction  and 
condensation  of  the  local  stiffness  matrices.  For  the  data  in  Tables  9.1  and 
9.2  there  is  one  local  matrix  for  each  of  ^  elements.  For  our  particular 
model  problems  the  local  stiffness  matrices  for  all  elements  are  the  same,  so 
that  considerable  saving  can  be  achieved  by  using  one  local  matrix  for  all  the 
elements.  An  estimate  for  the  cost  of  implementing  the  solver  this  way  is 
obtained  from: 

time  using  1  local  matrix 

<B  total  time  -  (time  for  local  matrix  computations) 

*  (time  for  local  matrix  computations)/^ 
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(Here  in  the  local  matrix  computation  both  the  construction  and  condensation 
is  Included. ) 


These  timings,  for  5%  accuracy  and  0.1%  accuracy,  are  presented  in 
Tables  9.3  and  9.4  respectively  (for  one  processor  computations).  There  Is  In 
general  a  draunatic  drop  In  the  total  cost  required  by  the  finite  element 
solver.  The  optimal  combination  is  also  tilted  to  high  degrees.  The 
comparison  between  the  Q(p]  and  Q'(p)  elements  is  consistent  with  the 
observations  above.  The  Q(p)  element  is  more  efficient  when  low  accuracy  is 
obtained  and  Q'(p)  is  more  efficient  when  high  accuracy  is  required  or  the 
solution  is  smoother.  The  character  of  the  optimal  combination  of  t  aind  p 
seen  here  is  in  agreement  with  the  results  of  an  analysis  of  a  computational 
model  in  [10], 


Table  9.3.  CPU  times  for  various  combinations  of  1  and  p  to  achieve 
approximately  5%  accuracy  for  a  »  1.05  and 
one  local  stiffness  matrix  computation. 


Q(p) 

Q'(p) 

n 

a 

Error  % 

time 

B 

B 

Error  % 

time 

13 

5.8 

1.16 

2 

14 

7.0 

0.92 

9 

5.1 

0.55 

4 

10 

4.5 

0.94 

3.0 

0.74 

8 

B 

3.5 

1.69 

6 

5.8 

0.58 

16 

B 

5.0 

3.07 

B 

2.4 

0.70 

8 

B 

5.8 

1.02 

8 

5 

1.6 

1.26 

B 

16 

3 

3.2 

2.61 

B 

7 
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Table  9.4.  CPU  times  for  various  combinations  of  1  and  p  to  achieve 
approximately  0.1%  accuracy  for  a  »  1.05  and 
one  local  stiffness  matrix  computation. 


Q(p) 

Q'(p) 

B 

B 

Error  X 

time 

B 

B 

Error  X 

time 

4 

13 

2.64 

4 

14 

0.09 

8 

8 

2.68 

8 

9 

0.10 

8 

9 

0.04 

3.34 

16 

6 

0.09 

16 

6 

0.06 

6.51 

mu 

Finally  as  observed  in  [3]  the  element  oriented  computations  required  by 
the  solver  used  (with  repeated  stiffness  matrix  as  in  Table  9.1)  allows  a 
large  amount  of  natural  parallelism  in  the  solution  process.  In  Table  9.5  we 
show  the  CPJ  times  and  speedups  for  several  choices  of  t  and  p  on  multiple 
processors  of  the  Alliant  FX/8  in  comparison  with  computations  reported  in 
Table  9.1.  Ve  see  typical  speedups  on  eight  processors  between  five  and  six. 

Table  9.5  CPU  times  and  speedups  on  multiple  processors 
for  various  combinations  of  t  and  p  that 
produce  approximately  5%  error  for  a  ■  1.05, 


no.  of 
processors 

Q(p) 

Q'Cp) 

B 

B 

time 

speed  up 

B 

B 

time 

speed  up 

8 

4 

7 

ig 

5.06 

4 

10 

0.59 

5.53 

8 

8 

5 

wm 

5.45 

S 

B 

0.87 

5.45 

8 

16 

3 

ig 

4.98 

16 

D 

1.24 

5.31 

4 

2 

10 

0.62 

3.17 

2 

n 

0.71 

3.31 

Parallel  efficiency  is  affected  by  such  factors  as  the  nvimber  of  elements,  the 
amount  of  work  required  per  element  and  the  storage  requirements  of  the  local 


computations.  For  the  data  of  Table  9.5.  the  largest  speedup  appear  when 
t  ^  8.  This  Is  a  consequence  of  the  fact  that  there  is  considerable  overhead 
for  the  relatively  small  number  elements  when  £  »  4  and  for  the  small  amount 
of  work  per  element  when  £  »  16.  See  t3l  for  a  detailed  study  of  such 
effects  on  parallel  implementation. 

The  implementation  we  have  used  is  not  adaptive  and  hence  the  timing  for 
the  prescribed  accuracy  is  not  completely  realistic  because  this  accuracy  is 
not  known  in  advance.  We  can  consider  for  example  the  p-adaptlvity  for 
given  mesh  where  p  is  increased  adaptively.  Various  aspects  of  such 
adaptive  approaches  will  be  discussed  elsewhere. 
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The  Labontoiy  fiir  Nanerkal  Analysis  is  an  integral  part  of  the  Institute  for  Physical 
Science  and  Tedmology  of  the  University  of  Mai^and,  under  the  general  administration  of  the 
Director,  Institute  for  Pl^sical  Sdence  and  Tedmology.  It  has  the  following  goals: 

•  To  conduct  researdi  in  the  mathematical  theory  and  computatioiud  implementation  of 
numerical  anafysis  and  related  tq)ics,  with  emphasis  on  the  numerir^  treatment  of 
linear  and  nonlbear  differential  equations  and  proUems  in  linear  and  nonlinear  algebra. 

•  To  help  bridge  gaps  between  computational  directions  in  engineering,  physics,  etc.,  and 
those  in  the  mathematical  communi^. 

•  To  provide  a  limited  consulting  service  in  all  areas  of  numerical  mathematics  to  the 
University  as  a  sriiole,  and  also  to  government  agendes  and  industries  in  the  State  of 
Mai^and  and  the  Washington  Metropolitan  area. 

•  To  assist  with  the  education  of  num^kal  anatysts,  e^iecially  at  the  postdoctoral  level, 
in  conjunction  with  the  Interdiscqdinaiy  Allied  Mathematics  Program  and  the 
programs  of  the  Mathematics  and  Conqjuter  Sdence  Dq>artments.  This  indudes  active 
collaboration  with  government  agendes  sudi  as  the  National  Institute  of  Standards  and 
Tedmology. 

•  To  be  an  international  center  of  study  and  research  for  foreign  students  in  numerical 
mathematics  ^o  are  supported  fay  foreign  ^emments  or  exchange  agendes 
(Fulbright,  etc.). 

Further  information  may  be  obtained  from  ProTessor  L  BaboSka,  Qiairman,  Laboratory  for 
Numerical  Analysis,  Institute  fcM’  I%ysical  Sdence  and  Tedmology,  University  of  Maryland,  CoUege 
Park,  Maryland  20742-:'.431. 


